ABSTRACT

Minimizing the scatter between cluster mass and accessible
observables is an important goal for cluster cosmology. In this
work, we introduce a new matched filter richness estimator, and
test its performance using the maxBCG cluster catalog. Our new
estimator significantly reduces the variance in the
$L_X$-richness relation, from
$\sigma^2_{\ln L_X} = (0.86 \pm 0.02)^2$ to
$\sigma^2_{\ln L_X} = (0.69 \pm 0.02)^2$. Relative to the maxBCG
richness estimate, it also removes the strong redshift
dependence of the richness scaling relations, and is
significantly more robust to photometric and redshift errors.
These improvements are largely due to our more sophisticated
treatment of galaxy color data. We also demonstrate that the
scatter in the $L_X$-richness relation depends on the aperture used
to estimate cluster richness, and introduce a novel approach for optimizing said aperture which can be easily 
generalized to other mass tracers. 

Subject headings: galaxies: clusters - X-rays: galaxies: clusters 
1. INTRODUCTION

The dependence of the halo mass function on cosmology 
is a problem that is well understood both analytically (Press 
& Schechter 1974; Bond et al. 1991; Sheth & Tormen 2002) 
and numerically (Jenkins et al. 2001; Warren et al. 2006; Tin- 
ker et al. 2008). In principle, this detailed understanding al- 
lows one to place tight constraints on the amplitude of the pri- 
mordial power spectrum and on dark energy parameters (e.g. 
Holder et al. 2001; Haiman et al. 2001). In practice, life is not
so simple. Cluster mass is not an observable, and so we must 
rely on other quantities that trace mass to estimate the halo 
mass function. In this context, observables that are tightly 
correlated with mass and whose scatter is well understood are 
highly desirable, as they permit a more accurate measurement 
of the mass function. 

One such mass tracer, and the subject of interest for this 
work, is the so-called cluster richness, a measure of the galaxy
content of a cluster. Relative to other popular mass tracers
such as X-ray properties, SZ-decrements, and galaxy velocity 
dispersion, optical richness has unique advantages and disad- 
vantages. Its unique advantages are: 
1. cluster richness can be easily estimated with inexpensive,
photometric optical data.

2. cluster richness can be estimated for both massive clus- 
ters and low mass groups. 

The first of these two properties is significant because it im- 
plies that cluster richness estimates are readily available given 
any large, photometric optical survey such as the SDSS (York 
et al. 2000a), DES, or LSST. The latter property, on the
other hand, is an important advantage for a much more inter- 
esting reason. 

Beginning with White et al. (1993), cosmological constraints
from galaxy clusters have been presented as a degeneracy
relation $\sigma_8 \Omega_m^{\gamma} = \mathrm{constant}$, where
$\gamma \approx 0.5$, $\sigma_8$ is a parameter specifying the
amplitude of the primordial power spectrum, and $\Omega_m$ is
the matter density of the universe in units of the critical
density. The existence of this degeneracy is easy to explain
(Rozo et al. 2004): suppose that we only measured the abundance
of galaxy clusters at a single mass scale. Since the halo mass
function depends on both $\sigma_8$ and $\Omega_m$, it is
evident that with just one observable there must be a degeneracy
between these two parameters. But what if we measure the 
halo mass function over a range of scales? This is roughly 
equivalent to measuring the amplitude and slope of the halo 
mass function at the statistical pivot point. If the mass range 
probed is small, then the slope of the mass function is not well 
constrained, and the degeneracy between $\sigma_8$ and $\Omega_m$ will
remain. In order to break this degeneracy, a measurement of the
halo mass function over a large range of masses is necessary. 
Currently, only spectroscopic velocity measurements and op- 
tical richness estimates can probe a mass range wide enough 
to successfully break this degeneracy, but the former requires 
considerably more observing resources. 

Footnote: DES, http://www.darkenergysurvey.org/
Footnote: LSST, http://www.lsst.org/lsst_home.shtml

There are, however, important disadvantages to using cluster
richness as a mass tracer. For instance, historically, the fact
that the relation between cluster richness and mass cannot be
predicted a priori based on simple physical arguments was viewed
as a significant drawback. Nowadays, however, this argument
holds little sway, since the level of accuracy required for
precision cosmology in our a priori knowledge of cluster scaling
relations is pushing current research towards a self-calibrating
approach, in which both cosmology and cluster scaling relations
are simultaneously constrained from the data (Lima & Hu 2004;
Majumdar & Mohr 2004; Lima & Hu 2005; Hu & Cohn 2006; Wu et al.
2008). Thus, insofar as self-calibration is necessary to insure
oneself against possible biases in cosmological estimates, the
lack of a simple physical model for predicting cluster richness
is no longer a serious drawback.

Another reason why optical richness estimates fell out of 
favor relative to other mass tracers is that, in the past, rich- 
ness estimates were known to suffer from significant projec- 
tion effects, which resulted in impure cluster samples as well 
as large scatter in the mass-richness relation. Abell made one 
of the first systematic attempts at measuring richness (Abell 
1958; Abell et al. 1989) in defining his richness classes. He 
tried to minimize projection by only counting galaxies dimmer
than $m_3$, the magnitude of the third brightest cluster galaxy,
but brighter than $m_3 + 2$. The bright cut is aimed at
foreground interlopers, while the dim cut reduces the
contribution of the galaxy background. Later methods used
similar counting techniques but included a proper account of
the background (e.g. Bahcall 1981). Since then, more
sophisticated algorithms have been developed and applied to
CCD-based imaging (e.g. matched-filter methods; Postman et al.
1996; Bramel et al. 2000; Yee & Lopez-Cruz 1999; Kochanek 
et al. 2003; Dong et al. 2008). 

Projection effects are now a much more benign problem 
thanks to these more sophisticated richness measurement 
techniques, the advent of accurate photometric data enabled 
by modern CCDs, and most recently, the well-known observations
that ellipticals and cluster E/S0 galaxies in particular tend to
form a tight ridgeline in color-magnitude space (Visvanathan &
Sandage 1977; Bower et al. 1992; Gladders & Yee
2000; Koester et al. 2007b). This color clustering has been 
integral to richness measurements in the SDSS (Goto et al. 
2002; Miller et al. 2005; Koester et al. 2007b) and the Red 
Sequence Cluster Survey (RCS: Gladders & Yee 2005), and 
such color-based measures have been shown to be effective 
mass tracers (Yee & Ellingson 2003; Muzzin et al. 2007; Shel- 
don et al. 2007a; Johnston et al. 2007; Rykoff et al. 2008b; 
Becker et al. 2007a). 

While richness estimates show a strong correlation with 
other mass proxies (e.g. Yee & Ellingson 2003; Dai et al. 
2007; Sheldon et al. 2007a; Johnston et al. 2007; Becker et al. 
2007a; Rykoff et al. 2008a), considerable scatter in the
mass-richness relation still remains. For instance, the richness
measure used in the RCS cluster catalog has a logarithmic
scatter of $\sigma_{\ln M} \sim 0.8$ (Gladders et al. 2007),
while for maxBCG clusters the number is closer to
$\sigma_{\ln M} \sim 0.5$ (Rozo et al. 2008). This is to be
compared to the scatter for X-ray mass tracers, which is
expected to be as low as $\approx 8\%$ for $Y_X$ based on
simulations (Kravtsov et al. 2006), or as high as
$\approx 25\%$ for non-core-extracted soft X-ray band
luminosities (e.g. Stanek et al. 2006;
Vikhlinin et al. 2008a). Clearly, much improvement is needed
to bring the scatter of richness measures to the level of X-ray 
mass tracers. 

This work is aimed at reducing the variance in the richness- 
mass relation. We do this by explicitly constructing a new 
richness estimator that significantly reduces the scatter in 
mass at fixed richness for maxBCG clusters. Relative to N200 
of maxBCG, we introduce two significant differences. The first
of these involves using a matched filter algorithm to estimate
cluster richness. Matched filters have been used in the
literature before (Postman et al. 1996; Kochanek et al. 2003). 
Unlike those works, however, our matched filter includes a 
color component, which is of critical importance for reduc- 
ing projection effects over the redshift range spanned by our 
cluster sample. In that sense, our filter is closer in spirit to
that of Dong et al. (2008), who include a photometric red- 
shift filter into their richness estimate. We also note here that 
group-scale studies suggest that some measure of the average 
color in the cluster is indicative of mass, particularly below 
$\sim 10^{14}\,M_\odot$ (Martinez et al. 2002; Martinez & Muriel 2006;
Weinmann et al. 2006; Hansen et al. 2007).

The second difference we introduce is the way in which 
the aperture used to estimate cluster richness is determined. 
Generically, cluster richness estimators involve counting the 
number of galaxies within some specified aperture, which can 
thus be interpreted as defining the "size" of the cluster. This
raises the question, then, of how one is to select the correct
size of a cluster a priori. Theoretically, halo sizes are
usually defined in terms of $R_\Delta$, a radius which
encompasses a mean density that is $\Delta$ times either the
mean or the critical density of the universe (conventions vary
from author to
author). Unfortunately, not only is such a definition not ap- 
plicable observationally, authors vary both on the reference 
background density (critical versus mean mass density), and 
on the specific overdensity value. Thus, even though signifi- 
cant progress has been made (Cuesta et al. 2008), a definitive 
definition of halo size remains elusive. 

In this work, we approach this question with observations 
in mind. That is, rather than coming with a preconceived no- 
tion of what the radius of a cluster is, we let the data tell us 
what the optimal radii for our clusters are by demanding that
optical richness be as tightly correlated as possible with X-ray
luminosity. The idea is as follows: first, one posits a scaling
relation between cluster richness and cluster radius. When
estimating cluster richness, one then demands that the
richness-radius scaling relation be satisfied. For instance,
given a cluster, one can simply make an initial guess for its
richness. Using the richness-radius scaling relation, one can
then draw
a circle of the appropriate radius, and count the number of 
galaxies within it. If the richness was underestimated, one 
will find too many galaxies, signaling that the richness es- 
timate must be increased. Proceeding in this way, one can 
quickly zero in on the appropriate richness for the object. 

This does, however, leave open the question of what the 
correct richness-radius relation is. Since we are interested in 
finding a new richness estimator that is tightly correlated with 
halo mass, we can use the scatter in the mass-richness rela- 
tion as our figure of merit to determine the "correct" richness- 
radius relation. In practice, we use the Lx-richness scatter 
rather than the mass-richness scatter because the scatter in 
mass is not directly observable. We emphasize that since the 
mass scatter at fixed X-ray luminosity (see e.g. Vikhlinin et al.
2008b) is considerably tighter than the corresponding scatter 
at fixed richness (Rozo et al. 2008), the use of X-ray luminos- 
ity as a mass tracer for our purposes is well justified. 

Footnote: In Kochanek et al. (2003), the low redshift of the
clusters makes single-band magnitudes better proxies for
distance than colors, so the lack of a color filter in the
richness estimator is less important for their work.

The layout of the paper is as follows. We describe the data
sets used in this work in § 2. Our matched filter estimator is
introduced in § 3, followed by our method for determining the
optimal radius-richness relation in § 4. We present our results
in § 5. In investigating the properties of our new richness
measure, we have discovered that the redshift evolution of the
richness-mass relation of our new estimator is much milder than
that measured for $N_{200}$. These results and the corresponding
discussion are presented in § 6. We summarize our results and
present our conclusions in § 7. Throughout, whenever needed, a
flat $\Lambda$CDM cosmology with $\Omega_m = 0.3$ and $h = 1.0$
was assumed.

2. DATA 

The data for the analysis presented in this work come from
two large area surveys, the Sloan Digital Sky Survey (SDSS; 
York et al. 2000b) and the ROSAT All-Sky Survey (RASS: 
Voges et al. 1999). SDSS imaging data are used to select 
clusters and to measure their matched filter richness; RASS 
data provide 0.1-2.4 keV X-ray fluxes, which we convert into 
estimates of the X-ray luminosity of the clusters. 

2.1. SDSS 

The imaging and spectroscopic surveys that comprise the 
SDSS are currently in the sixth Data Release (Adelman- 
McCarthy et al. 2008). This release includes nearly 8500 
square degrees of drift-scan imaging in the Northern
Galactic Cap, and another 7500 square degrees of spectro- 
scopic observations of stars, galaxies, and quasars. 

The camera design (Gunn et al. 2006) and drift-scan imag- 
ing strategy of the SDSS enable acquisition of nearly simul- 
taneous observations in the u,g,r,i,z filter system (Fukugita 
et al. 1996). Calibration (Hogg et al. 2001; Smith et al. 2002; 
Tucker et al. 2006), astrometric (Pier et al. 2003), and pho- 
tometric (Lupton et al. 2001) pipelines reduce the data into 
object catalogs containing a host of measured parameters for 
each object. 

The maxBCG cluster sample and the galaxy catalogs used 
to remeasure cluster richness in this paper are derived from 
the SDSS. The galaxy catalogs are drawn from an area ap- 
proximately coincident with DR4 (Adelman-McCarthy et al. 
2006). Galaxies are selected from SDSS object catalogs as 
described in Sheldon et al. (2007b). In this work we use
CMODEL_COUNTS as our total magnitudes, and MODEL_COUNTS
when computing colors. Bright stars, survey edges and re- 
gions of poor seeing are masked as previously described 
(Koester et al. 2007a; Sheldon et al. 2007b). 

2.2. Cluster Sample 

We obtain sky locations, redshift estimates, and initial rich- 
ness values from the maxBCG cluster catalog. Details of 
the selection algorithm and catalog properties are published 
elsewhere (Koester et al. 2007a,b). In brief, maxBCG selec- 
tion relies on the observation that the galaxy population of 
rich clusters is dominated by luminous, red galaxies clustered 
tightly in color (the E/S0 ridgeline). Since these galaxies have
old, passively evolving stellar populations, their g-r color
closely reflects their redshift. The brightest such red galaxy,
typically located at the peak of the galaxy density, defines the
cluster center.

The maxBCG catalog is approximately volume limited in 
the redshift range 0.1 < z < 0.3, with very accurate
photometric redshifts ($\delta z \simeq 0.01$). Studies of the
maxBCG algorithm applied to mock SDSS catalogs indicate that the
completeness and purity are very high, above 90% (Koester et al.
2007a; Rozo et al. 2007). The maxBCG catalog has been used to
investigate the scaling of galaxy velocity dispersion with
cluster richness (Becker et al. 2007b) and to derive constraints
on the power spectrum normalization, $\sigma_8$, from cluster
number counts (Rozo et al. 2007).

The primary richness estimator used in the maxBCG catalog is
$N_{200}$, defined as the number of galaxies with g-r colors
within $2\sigma$ of the E/S0 ridgeline as defined by the BCG
color, brighter than $0.4L_*$ (in i-band), and found within
$r_{200}^{gals}$ of the cluster center. $r_{200}^{gals}$ is a
cluster radius that depends upon the number of galaxies within a
fixed $1\,h^{-1}$ Mpc aperture of the BCG, labeled $N_{gals}$,
with the relation $r_{200}^{gals}(N_{gals})$ being calibrated so
that, on average, the galaxy overdensity within $r_{200}^{gals}$
is $200\,\Omega_m^{-1}$ assuming $\Omega_m = 0.3$ (Hansen et al.
2005). The full catalog comprises 13,823 objects with a richness
threshold $N_{200} > 10$, corresponding to
$M > 5 \times 10^{13}\,h^{-1}\,M_\odot$ (Johnston et al. 2007).

As mentioned in the introduction, we re-estimate the cluster
richness for every object in the maxBCG catalog, and measure
the corresponding scatter in the $L_X$-richness relation. When
doing so, we always limit ourselves to the 2000 richest
clusters, ranked according to the new richness estimate. This
cut is made to ensure that our results are insensitive to the
$N_{200} > 10$ cut of the maxBCG catalog. That is, the number of
clusters with $N_{200} > 10$ that fall within the 2000 richest
clusters for any of the new richness measures considered has no
impact on the recovered scatter. We also note that always
selecting the 2000 richest clusters implies that the specific
cluster sample used to estimate the scatter in the
$L_X$-richness relation varies somewhat as we vary the richness
estimator.

2.3. X-ray Measurements 

The scatter in $L_X$ at fixed richness is estimated using a
slight variant of the method presented in Rykoff et al. (2008b).
Briefly, we use the RASS photon maps to estimate the
0.5-2.0 keV X-ray flux at the location of each cluster, which is
used to derive $L_X$ [0.1-2.4 keV] using the cluster photometric
redshift (the conversion factors are similar to those used in
Böhringer et al. 2004). We then perform a Bayesian linear least
squares fit to $\ln L_X$ as a function of $\ln N$, where $N$ is
the richness parameter to be tested. The variance in $\ln L_X$
is included as a free parameter. The fit is done following the
algorithm presented in Kelly (2007), and correctly takes into
account upper limits for $L_X$ for those clusters with upper
limits on X-ray emission.
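
To make the role of the scatter parameter and of the upper limits concrete, the following is a minimal sketch of a Gaussian likelihood for $\ln L_X = a + b \ln N$ with scatter $\sigma$. It is not the Kelly (2007) algorithm: measurement errors are ignored, and the function name, the synthetic data, and the detection threshold are all illustrative assumptions.

```python
import math
import random


def log_likelihood(a, b, sigma, ln_rich, ln_lx, is_upper_limit):
    """Log-likelihood of ln L_X = a + b * ln N with Gaussian scatter sigma.

    Detections contribute a Gaussian density. Upper limits contribute the
    probability that the true ln L_X lies below the quoted limit.
    """
    total = 0.0
    for x, y, censored in zip(ln_rich, ln_lx, is_upper_limit):
        z = (y - (a + b * x)) / sigma
        if censored:
            # Normal cumulative distribution evaluated at the limit
            cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
            total += math.log(max(cdf, 1e-300))
        else:
            total += -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))
    return total


# Synthetic example; every number below is made up for illustration.
rng = random.Random(0)
ln_rich = [rng.uniform(3.0, 5.0) for _ in range(500)]
ln_lx_true = [1.0 + 1.6 * x + rng.gauss(0.0, 0.7) for x in ln_rich]
limit = 6.5  # hypothetical detection threshold in ln L_X
is_upper = [y < limit for y in ln_lx_true]
ln_lx = [limit if censored else y for y, censored in zip(ln_lx_true, is_upper)]

ll_true = log_likelihood(1.0, 1.6, 0.7, ln_rich, ln_lx, is_upper)
ll_flat = log_likelihood(1.0, 0.0, 0.7, ln_rich, ln_lx, is_upper)
ll_wide = log_likelihood(1.0, 1.6, 2.0, ln_rich, ln_lx, is_upper)
```

In this toy setup the likelihood prefers the parameters used to generate the data over both a flat relation and an inflated scatter. That preference is what allows the variance to be constrained as a free parameter alongside the slope and amplitude.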

It is important to note here that the estimated X-ray
luminosity of a cluster depends on the aperture used to measure
$L_X$. Rykoff et al. (2008b) used a fixed $750\,h^{-1}$ kpc
aperture as a compromise between needing a large aperture to
avoid losing X-ray photons due to the ROSAT PSF and cluster
miscentering, and the need for a small aperture in order to
increase the signal to noise of the cluster emission. Further
work has shown that the scatter in $L_X$ at fixed $N_{200}$ is
minimized when using an aperture of $1\,h^{-1}$ Mpc. The
corresponding scatter for the top 2000 maxBCG clusters is
$\sigma_{\ln L_X | N_{200}} = 0.96 \pm 0.03$.

Footnote: The attentive reader will note that the quoted scatter
in $L_X$ at fixed richness is significantly larger than the
scatter in mass at fixed richness quoted in the introduction,
which was closer to 0.5. Given a slope of 1.6 in the $L_X$-$M$
relation, a scatter of 0.96 in $L_X$ corresponds to
$\approx 0.96/1.6 \approx 0.6$ scatter in mass. The remaining
10% difference is because the scatter in Rozo et al. (2008) uses
the scatter of the 1000 richest clusters, which is smaller than
that of the 2000 richest clusters by 0.1.

The nature of the present exercise has the benefit of assigning
a cluster radius $R_c$ to each individual cluster, so it is
natural to measure $L_X$ on the same scale as the optical
richness. Thus, in this work, we estimate $L_X$ using a variable
aperture which depends upon the cluster's richness. Using a
fixed $1\,h^{-1}$ Mpc aperture to estimate $L_X$ does not have a
large effect on our results, for reasons that will be discussed
below. Finally, we note that very small physical apertures are
impractical for the most distant clusters due to the large size
of the RASS PSF, which corresponds to a physical scale of
$300\,h^{-1}$ kpc (FWHM) at z = 0.23, the median redshift of the
maxBCG catalog. Therefore, we place a fixed minimum aperture of
$500\,h^{-1}$ kpc for each cluster. We discuss the small effect
of this aperture cutoff in § 4.

2.4. Cleaning the Sample 

Our analysis depends on a combination of optical and X-ray 
measurements of maxBCG clusters using SDSS and RASS 
data. As discussed in detail in Rykoff et al. (2008b, see § 5.6), 
there is clear evidence that cool core clusters increase the scat- 
ter in X-ray cluster properties. High resolution X-ray imaging 
of clusters allows the exclusion of cluster cores, reducing the 
scatter in observed X-ray properties (e.g. O'Hara et al. 2006; 
Chen et al. 2007; Maughan 2007). Unfortunately, the broad 
PSF of RASS means that it is impossible to exclude the cores 
of clusters in this work. In order to assess how robust our
results are to the presence or absence of cooling flow clusters
in the cluster sample, we have created a "clean" sample of 
maxBCG clusters by removing all known cool core clusters 
that might have boosted global X-ray luminosity and may sig- 
nificantly bias our results. In addition, we have removed ap- 
parently X-ray bright maxBCG clusters that were determined 
via inspection to have their X-ray flux significantly contami- 
nated by foreground objects such as stars, low redshift galaxy 
clusters, and AGN. 

There does not exist a complete, unbiased catalog of cool 
core X-ray clusters. The presently described cleaning pro- 
cedure is not intended to be complete, and is intended only 
to give some sense of the robustness of our results to the 
presence of cooling flow clusters. Following Rykoff et al. 
(2008b), we have assembled all the known cool core clusters
from the literature. This includes: A750, A1835, Z2701, Z3146,
Z7160, RXC J2129.6+0005 (Bauer et al. 2005), A1413 (Chen et al.
2007), A2244 (Peres et al. 1998), and RXC J1504.1-0248
(Böhringer et al. 2005). From here on,
the maxBCG catalog presented in Koester et al. (2007b) is 
referred to as the "full" cluster sample, and the subsample de- 
scribed above is referred to as the "clean" cluster sample. 

3. MATCHED FILTER RICHNESS ESTIMATORS 

3.1. Derivation of the Matched Filter Richness Estimator 

Let $\mathbf{x}$ be a vector characterizing the observable
properties of a galaxy (e.g. galaxy color and magnitude). We
model the projected galaxy distribution around clusters as a sum
$S(\mathbf{x}) = \lambda u(\mathbf{x}|\lambda) + b(\mathbf{x})$,
where $\lambda$ is the number of cluster galaxies,
$u(\mathbf{x}|\lambda)$ is the cluster's galaxy density profile
normalized to unity, and $b(\mathbf{x})$ is the density of
background (i.e. non-member) galaxies. The probability that a
galaxy found near a cluster is actually a cluster member is
given by

$$ p(\mathbf{x}) = \frac{\lambda u(\mathbf{x}|\lambda)}{\lambda u(\mathbf{x}|\lambda) + b(\mathbf{x})}. \qquad (1) $$

Consequently, the total number of cluster galaxies $\lambda$
must satisfy the constraint equation

$$ \lambda = \sum p(\mathbf{x}|\lambda) = \sum \frac{\lambda u(\mathbf{x}|\lambda)}{\lambda u(\mathbf{x}|\lambda) + b(\mathbf{x})}, \qquad (2) $$

where the sum is over all galaxies in the cluster field. If the
filters $u(\mathbf{x}|\lambda)$ and $b(\mathbf{x})$ are known,
then given an observed galaxy distribution
$\{\mathbf{x}_1, ..., \mathbf{x}_N\}$ around a cluster we can
define a richness estimator $\lambda$ as the solution to
equation 2. As it turns out, one can also derive this expression
using a maximum likelihood approach. Interested readers are
referred to appendix A for details. From now on, the letter
$\lambda$ shall always refer to a matched filter richness
estimate obtained with equation 2.

3.2. Cluster Radii and Matched Filter Richness Estimates 

Consider again Eqn. 2. As mentioned before, the sum used 
in Eqn. 2 needs to extend over all galaxies. In practice, of 
course, one needs to add over all galaxies within some cutoff 
radius $R_c$. Operationally, this is equivalent to setting $u = 0$
for all galaxies with radii $R > R_c$, so it is natural to interpret
the cutoff radius Rc as a cluster radius. In this light, it seems 
obvious that considerable care must be taken to choose the 
correct cluster radius when estimating richness, but how to go 
about doing just that is a less straightforward question. 

In this work, we propose that cluster radii be selected on the
basis of a model radius-richness relation. Specifically, we
assume that the size of a cluster of richness $\lambda$ scales
as a power law of $\lambda$,

$$ R_c(\lambda) = R_0 (\lambda/100.0)^{\alpha}. \qquad (3) $$

Naively, we expect $R_0 \sim 1$ Mpc, as that is the
characteristic size of clusters, and $\alpha \sim 1/3$ assuming
that $R \propto M^{1/3} \propto \lambda^{1/3}$. We postpone the
discussion of how we go about selecting $R_0$ and $\alpha$ to
section 4. For the time being, we shall simply assume that $R_0$
and $\alpha$ are known. In that case, equation 2 becomes

$$ \lambda = \sum p(\mathbf{x}|\lambda) = \sum_{R < R_c(\lambda)} \frac{\lambda u(\mathbf{x}|\lambda)}{\lambda u(\mathbf{x}|\lambda) + b(\mathbf{x})}. \qquad (4) $$

Note that we have explicitly included the cutoff radius $R_c$ in
the sum above, and that this cutoff radius now depends on
$\lambda$. Moreover, one can see that in the above equation, the
cluster richness $\lambda$ is the only unknown, so we can
numerically solve for $\lambda$. In other words, by positing a
richness-radius relation we are able to simultaneously estimate
both a cluster radius and the corresponding cluster richness.
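
As an illustration of how equation 4 can be solved numerically, the sketch below finds the richness by bisection on the difference between the summed membership probabilities and the richness itself. The text does not specify a root finder, so bisection is an assumption. The flat toy profile, the uniform background, the `"R"` dictionary key, and all function names are likewise hypothetical stand-ins for the actual filters.

```python
import math


def cutoff_radius(lam, r0=1.0, alpha=1.0 / 3.0):
    """Equation 3: R_c = R_0 * (lambda / 100)^alpha, in the units of r0."""
    return r0 * (lam / 100.0) ** alpha


def residual(lam, galaxies, filter_u, background_b, r0, alpha):
    """Sum of membership probabilities inside R_c(lambda), minus lambda."""
    r_c = cutoff_radius(lam, r0, alpha)
    total = 0.0
    for gal in galaxies:
        if gal["R"] < r_c:
            u = filter_u(gal, r_c)  # normalized inside r_c, so it depends on lambda
            b = background_b(gal)
            total += lam * u / (lam * u + b)
    return total - lam


def solve_richness(galaxies, filter_u, background_b, r0=1.0, alpha=1.0 / 3.0,
                   lo=0.1, hi=1000.0, tol=1e-8):
    """Bisection on the residual; returns None if [lo, hi] brackets no root."""
    args = (galaxies, filter_u, background_b, r0, alpha)
    if residual(lo, *args) <= 0.0 or residual(hi, *args) >= 0.0:
        return None
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid, *args) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Toy inputs (hypothetical): 50 galaxies at radii 0.01 to 0.50, a flat
# cluster profile normalized to unity inside R_c, and a uniform background.
galaxies = [{"R": 0.01 + 0.01 * i} for i in range(50)]
BACKGROUND = 5.0  # background galaxies per unit area


def flat_u(gal, r_c):
    return 1.0 / (math.pi * r_c ** 2)


def flat_b(gal):
    return BACKGROUND


richness = solve_richness(galaxies, flat_u, flat_b)
```

For this toy case every galaxy has the same membership probability, so the solution reduces to the galaxy count inside $R_c$ minus the expected number of background galaxies in that aperture. This gives a simple check on the solver.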

Footnote: Note that since we are explicitly setting $u = 0$ for
$R > R_c$, the fact that $u$ must be normalized to unity
necessarily introduces a dependence of $u$ on $\lambda$. That
is, changing $\lambda$ will not only change the range of the sum
in equation 4, it will also change the value of the summands.

3.3. The Filters

In this work we consider three observable properties of
galaxies: $R$, the projected distance from a galaxy to the
assigned cluster center, $m$, the galaxy magnitude, and $c$, the
galaxy's g-r color. We adopt a separable filter function

$$ u(\mathbf{x}) = [2\pi R\,\Sigma(R)]\,\phi(m)\,G(c), \qquad (5) $$

where $\Sigma(R)$ is the two dimensional cluster galaxy density
profile, $\phi(m)$ is the cluster luminosity function (expressed
in apparent magnitudes), and $G(c)$ is the color distribution of
cluster galaxies. The prefactor $2\pi R$ in front of $\Sigma(R)$
accounts for the fact that, given $\Sigma(R)$, the radial
probability density distribution is given by
$2\pi R\,\Sigma(R)$. Also, note that the separability condition
makes the implicit assumption that these three quantities are
fully independent of each other, which is not true in detail
(for a discussion of the galaxy population of maxBCG clusters
see Hansen et al. 2007). For instance, the tilt of the ridgeline
implies that the mean color of a red sequence cluster galaxy
varies slightly as a function of magnitude. We postpone an
investigation of how including the correlation between these
various observables affects our conclusions to future work
(Koester et al., in preparation). We now describe each of our
three filters in detail. We note that defining said filters
requires us to specify parameters governing the shape of the
filters (e.g. $R_s$ for the radial filter, $\alpha$ for the
luminosity filter, etc.). A detailed study of the dependence of
our matched filter richness estimates on the shape of our
filters will be presented in future work.

3.3.1. The Radial Filter 

N-body simulations show that the matter distribution of massive
halos can be well described by the so-called NFW profile (see
e.g. Navarro et al. 1995, 1997),

$$ \rho(r) \propto \frac{1}{(r/r_s)(1 + r/r_s)^2}, \qquad (6) $$

where $r_s$ is the characteristic scale radius at which the
logarithmic slope of the density profile is equal to -2. The
corresponding two dimensional surface density profile
(Bartelmann 1996) is

$$ \Sigma(R) \propto \frac{f(R/R_s)}{(R/R_s)^2 - 1}, \qquad (7) $$

where $R_s = r_s$ and

$$ f(x) = 1 - \frac{2}{\sqrt{x^2 - 1}} \tan^{-1}\sqrt{\frac{x - 1}{x + 1}}. \qquad (8) $$

This formula assumes $x > 1$. For $x < 1$, one uses the identity
$\tan^{-1}(ix) = i \tanh^{-1}(x)$.

Here, we assume that the NFW profile can also reasonably
describe the density distribution of galaxies in clusters (Lin
& Mohr 2004; Hansen et al. 2005; Popesso et al. 2007), and
follow Koester et al. (2007a) in setting $R_s = 150\,h^{-1}$ kpc.
In principle, one could optimize the value of this parameter,
but we do not expect our final results to be overly sensitive to
our chosen value (see e.g. Dong et al. 2008). Also, in order to
avoid the singularity at $R = 0$ in the above expression, we set
$\Sigma$ to a constant for $R < R_{core} = 100\,h^{-1}$ kpc. This
core density is chosen so that the mass distribution $\Sigma(R)$
is continuous. Our results are insensitive to the particular
choice of core radius for $R_{core} < 200\,h^{-1}$ kpc. Finally,
the profile $\Sigma(R)$ is truncated at the cluster radius
$R_c(\lambda)$, and is normalized such that

$$ 1 = \int_0^{R_c(\lambda)} dR\, 2\pi R\, \Sigma(R). \qquad (9) $$

We emphasize that this condition implies that the normalization
constant for the density profile is richness dependent, and must
be recomputed for each $\lambda$ value when solving for
$\lambda$ in equation 4.
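
The radial filter described above can be sketched as follows. This is only an illustration under stated assumptions: the midpoint-rule integration, the function names, and the example cutoff radius of 0.9 are not from the text, while the scale and core radii are the quoted values expressed in $h^{-1}$ Mpc.

```python
import math

R_S = 0.150     # scale radius in Mpc/h, as quoted in the text
R_CORE = 0.100  # core radius in Mpc/h, as quoted in the text


def nfw_f(x):
    """Equation 8, using the tanh^-1 branch for x < 1 (x = 1 handled by caller)."""
    if x > 1.0:
        root = math.sqrt(x * x - 1.0)
        return 1.0 - 2.0 / root * math.atan(math.sqrt((x - 1.0) / (x + 1.0)))
    root = math.sqrt(1.0 - x * x)
    return 1.0 - 2.0 / root * math.atanh(math.sqrt((1.0 - x) / (1.0 + x)))


def sigma_shape(radius):
    """Unnormalized Sigma(R) of equation 7, held constant inside R_CORE."""
    x = max(radius, R_CORE) / R_S
    if abs(x - 1.0) < 1e-6:
        return 1.0 / 3.0  # analytic limit of f(x) / (x^2 - 1) as x -> 1
    return nfw_f(x) / (x * x - 1.0)


def make_radial_filter(r_cut, n_steps=4000):
    """Return 2*pi*R*Sigma(R), normalized to unit integral on [0, r_cut]."""
    dr = r_cut / n_steps
    # Midpoint-rule estimate of the integral in equation 9
    norm = sum(2.0 * math.pi * (i + 0.5) * dr * sigma_shape((i + 0.5) * dr) * dr
               for i in range(n_steps))

    def radial_filter(radius):
        if radius > r_cut:
            return 0.0  # the profile is truncated at the cluster radius
        return 2.0 * math.pi * radius * sigma_shape(radius) / norm

    return radial_filter


radial_filter = make_radial_filter(r_cut=0.9)  # hypothetical R_c in Mpc/h
```

Because the normalization depends on the cutoff, a new filter has to be built each time the trial richness, and hence $R_c(\lambda)$, changes. Caching the normalization as a function of the cutoff radius would be a natural optimization.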

3.3.2. The Luminosity Filter 

At z < 0.3, the luminosity distribution of satellite cluster
galaxies is well-represented by a Schechter function (e.g.
Hansen et al. 2007) which we write as

$$ \phi(m) = 0.4 \ln(10)\, \phi_*\, 10^{-0.4(m - m_*)(\alpha + 1)} \exp\left(-10^{-0.4(m - m_*)}\right). \qquad (10) $$

We take $\alpha = 0.8$ independent of redshift. The
characteristic magnitude, $m_*$, is corrected for the distance
modulus, k-corrected, and passively-evolved using stellar
population synthesis models described in Koester et al.
(2007b). When applying the luminosity filter, $m_*$ is chosen
from these models, appropriate to the redshift of the cluster
under consideration, and the filter is normalized by integrating
down to a magnitude corresponding to $0.4L_*$ at the cluster
redshift, or an absolute magnitude $M_i = -20.25$. The latter is
simply a luminosity cut bright enough to make the maxBCG sample
volume limited.
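
A minimal sketch of the normalization step for the luminosity filter is given below. It assumes the Schechter form of equation 10 exactly as written, with $\alpha$ passed through as quoted; sign conventions for the faint-end slope differ between authors and should be checked before reuse. The value of `m_star`, the bright-end integration limit, the midpoint rule, and the function names are illustrative assumptions.

```python
import math


def schechter_shape(mag, m_star, alpha):
    """Equation 10 without the phi_* prefactor, which cancels on normalization."""
    dm = mag - m_star
    return (0.4 * math.log(10.0) * 10.0 ** (-0.4 * dm * (alpha + 1.0))
            * math.exp(-10.0 ** (-0.4 * dm)))


def make_luminosity_filter(m_star, alpha, bright_offset=6.0, n_steps=4000):
    """Return (filter, m_faint), with the filter normalized down to 0.4 L_*."""
    m_faint = m_star - 2.5 * math.log10(0.4)  # apparent magnitude of 0.4 L_*
    m_bright = m_star - bright_offset         # bright end, where phi is negligible
    step = (m_faint - m_bright) / n_steps
    norm = sum(schechter_shape(m_bright + (i + 0.5) * step, m_star, alpha) * step
               for i in range(n_steps))

    def luminosity_filter(mag):
        if mag > m_faint:
            return 0.0  # fainter than the luminosity cut
        return schechter_shape(mag, m_star, alpha) / norm

    return luminosity_filter, m_faint


# m_star = 17.0 is a hypothetical characteristic magnitude for one cluster.
luminosity_filter, m_faint = make_luminosity_filter(m_star=17.0, alpha=0.8)
```

In practice `m_star` would come from the stellar population synthesis models at the cluster redshift, so the filter would be rebuilt per cluster rather than shared across the sample.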

3.3.3. The Color Filter 

Early type galaxies are known to dominate the inner regions of
low redshift galaxy clusters (see e.g. Dressler 1984; Kormendy &
Djorgovski 1989; Hansen et al. 2007). The rest-frame spectra of
these galaxies typically exhibit a significant drop at about
4000 Å, which gives early type galaxies at the same redshift
nearly uniformly red colors when observed through filters that
encompass this break. In the SDSS, the corresponding filters for
galaxies at z < 0.35 are g and r, and we find that the g-r
colors of early type galaxies are Gaussian distributed with a
small intrinsic dispersion of about 0.05 magnitudes.
Consequently, we take the color filter $G(c)$ to be

$$ G(c|z) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(c - \langle c|z \rangle)^2}{2\sigma^2}\right], \qquad (11) $$

where $c = g - r$ is the color of interest, $\langle c|z \rangle$
is the mean of the Gaussian color distribution of early type
galaxies at redshift $z$, and $\sigma$ is the width of the
distribution. The mean color
$\langle c|z \rangle = 0.625 + 3.149z$ was determined by
matching maxBCG cluster members to the SDSS LRG (Eisenstein et
al. 2001) and MAIN (Strauss et al. 2002) spectroscopic galaxy
samples. The net dispersion $\sigma$ is taken to be the sum in
quadrature of the intrinsic color dispersion $\sigma_{int}$, set
to $\sigma_{int} = 0.05$, and the estimated photometric error
$\sigma_m$. In g-r, the typical photometric error on the
red-sequence cluster galaxies brighter than $0.4L_*$ is
$\sigma_m \approx 0.01$ magnitudes for z = 0.1, but can be as
large as $\sigma_m \approx 0.05$ magnitudes for z = 0.3.
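
The color filter is simple enough to write out directly. The sketch below uses the mean color relation and intrinsic dispersion quoted in the text; the function names and the example photometric error of 0.03 magnitudes are illustrative assumptions.

```python
import math

SIGMA_INT = 0.05  # intrinsic red-sequence dispersion in g-r, from the text


def mean_color(z):
    """Mean g-r color of red-sequence galaxies at redshift z."""
    return 0.625 + 3.149 * z


def color_filter(color, z, sigma_phot):
    """Gaussian color filter G(c|z), normalized to unit integral over color."""
    # Intrinsic and photometric dispersions add in quadrature
    sigma = math.hypot(SIGMA_INT, sigma_phot)
    delta = color - mean_color(z)
    return math.exp(-0.5 * (delta / sigma) ** 2) / (math.sqrt(2.0 * math.pi) * sigma)


# A galaxy on the ridgeline versus one 0.2 mag redder, both at z = 0.2
peak = color_filter(mean_color(0.2), 0.2, sigma_phot=0.03)
offset = color_filter(mean_color(0.2) + 0.2, 0.2, sigma_phot=0.03)
```

A galaxy 0.2 magnitudes off the ridgeline receives a filter value that is a small fraction of the peak value, which is how the color term suppresses projected interlopers.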

3.3.4. Background Estimation 

To fully specify our filters, we also need to describe our
background model. We assume the background galaxy density
is constant in space, so that $b(x) = 2\pi R\,\Sigma_g(m_i, c)$, where
$\Sigma_g(m_i, c)$ is the galaxy density as a function of galaxy i-band
magnitude and g-r color. $\Sigma_g(m_i, c)$ is estimated by distributing
10^ random points throughout the same SDSS photometric
survey footprint that defines our galaxy sample. All galaxies
within an angular separation of 0.05 degrees of the random
points (about $1\,h^{-1}$ Mpc at $z = 0.25$) are used to empirically
determine the mean galaxy density $\Sigma_g(m_i, c)$ using a
top hat cloud-in-cells (CIC) algorithm (e.g. Hockney & Eastwood
1981). For our cells, we used 40 evenly-spaced bins in
$g-r \in [0, 2]$ and 60 bins in $i \in [14, 20]$. In each 2 dimensional
bin, the number density of galaxies is normalized by the total
number of random points, the width of each color and magnitude
bin (0.05 mags and 0.1 mags, respectively), and the area
searched ($0.05^2\pi$ square degrees).
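
The normalization just described can be illustrated with a short Python sketch. It is an assumption-laden simplification: a plain nearest-bin histogram (`numpy.histogram2d`) stands in for the cloud-in-cells assignment, the function name is hypothetical, and the input arrays are assumed to already hold every galaxy found within 0.05 degrees of any random point.

```python
import numpy as np

def background_density(gr, imag, n_random, radius_deg=0.05):
    """Galaxies per mag of color, per mag of i, per square degree.

    gr, imag : g-r colors and i-band magnitudes of the galaxies collected
               around the random points (hypothetical inputs).
    n_random : number of random points used.
    """
    color_edges = np.linspace(0.0, 2.0, 41)   # 0.05 mag wide color bins
    mag_edges = np.linspace(14.0, 20.0, 61)   # 0.1 mag wide magnitude bins
    counts, _, _ = np.histogram2d(gr, imag, bins=[color_edges, mag_edges])
    d_color = color_edges[1] - color_edges[0]
    d_mag = mag_edges[1] - mag_edges[0]
    area = np.pi * radius_deg ** 2            # area searched around each random point
    return counts / (n_random * d_color * d_mag * area)
```

Replacing the random points with cluster positions would, under the same assumptions, return a local rather than a global density estimate.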

This process creates an estimate of the global background, 
i.e. the number density of galaxies as a function of color and 
magnitude in the full SDSS survey. Not surprisingly, a sim- 
ilar result is obtained by binning the whole galaxy catalog 
in color and magnitude with CIC and dividing by the survey 
area. However, the procedure we employ above can readily 
be adapted to returning alternative background estimates, e.g 
the local cluster density as a function of redshift, by replacing 
random points with clusters. 
4. METHODS

We have now fully specified our richness estimators, except
for the values $R_0$ and $\alpha$ that govern the radius-richness scaling
relation. We now discuss how we go about selecting optimal
values for these parameters.

As we mentioned earlier, we wish to find the cluster rich- 
ness estimator that minimizes the scatter in the richness-mass 
relation. Cluster mass, however, is not an observable, and thus 
we must rely on other mass tracers. Here, we use X-ray luminosity
($L_X$) as our mass proxy, primarily because it is a well
known mass tracer (e.g. Reiprich & Böhringer 2002; Stanek
et al. 2006; Rykoff et al. 2008a) that is readily accessible to us
and for which we can quickly estimate the scatter for multiple
richness measures (see Rykoff et al. 2008b).

We proceed as follows: we begin by defining a coarse grid
in $R_0$ and $\alpha$, given by

$$R_0 = \{0.5, 0.75, 1.0, 1.25, 1.5\} \tag{12}$$
$$\alpha = \{-0.05, 0.05, 0.15, 0.25, 0.35, 0.45\} \tag{13}$$

where $R_0$ is measured in units of $h^{-1}$ Mpc. Each of these grid
points defines a distinct richness estimator through equation
4. For each grid point, we estimate the corresponding richness
for every cluster in the maxBCG catalog. We then select the
2000 richest clusters and calculate the scatter in $L_X$ at fixed
richness of those top 2000 clusters. Note that, because the
rank ordering of the clusters changes as we vary our richness
estimate, the clusters used to estimate the scatter in $L_X$ vary
slightly across the grid. We limit ourselves to the richest 2000
clusters to ensure our results are insensitive to the $N_{200} > 10$
cut in the maxBCG catalog.
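
The grid scan can be sketched as follows. This is an illustrative outline only: `measure_richness` is a hypothetical callable standing in for the richness estimator, and the scatter is approximated here by the RMS residual about a straight-line fit in log space, which is a simplification of the measurement of Rykoff et al. (2008b) and ignores upper limits.

```python
import numpy as np

R0_GRID = [0.5, 0.75, 1.0, 1.25, 1.5]               # h^-1 Mpc (Eq. 12)
ALPHA_GRID = [-0.05, 0.05, 0.15, 0.25, 0.35, 0.45]  # Eq. 13

def scatter_at_fixed_richness(ln_richness, ln_lx):
    """RMS residual of ln L_X about a linear fit in ln richness (simplified stand-in)."""
    slope, intercept = np.polyfit(ln_richness, ln_lx, 1)
    resid = ln_lx - (intercept + slope * ln_richness)
    return np.sqrt(np.mean(resid ** 2))

def scan_grid(measure_richness, ln_lx, n_top=2000):
    """Return scatter[i, j] for R0_GRID[i] and ALPHA_GRID[j].

    measure_richness(r0, alpha) is a hypothetical callable returning one
    richness per cluster, in the same order as ln_lx.
    """
    ln_lx = np.asarray(ln_lx, dtype=float)
    scatter = np.empty((len(R0_GRID), len(ALPHA_GRID)))
    for i, r0 in enumerate(R0_GRID):
        for j, alpha in enumerate(ALPHA_GRID):
            richness = np.asarray(measure_richness(r0, alpha), dtype=float)
            # The richest clusters are re-selected for every estimator.
            top = np.argsort(richness)[::-1][:n_top]
            scatter[i, j] = scatter_at_fixed_richness(np.log(richness[top]), ln_lx[top])
    return scatter
```

Re-selecting the top clusters inside the loop mirrors the remark that the sample used to measure the scatter changes slightly from grid point to grid point.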

From our measurements of the scatter $\sigma_{\ln L_X|\lambda}(R_0, \alpha)$ at each
grid point, we can directly read which parameter combination
minimizes the scatter. We emphasize that because the scatter
in mass at fixed $L_X$ is much lower than the corresponding
scatter at fixed richness (Rozo et al. 2008), for our purposes
$L_X$ is a nearly perfect mass tracer. We note that the X-ray
measurements described in § 2.3 require a minimum aperture of
$500\,h^{-1}$ kpc. For the 2000 richest clusters, this cutoff is only
employed when $R_0 = 0.5\,h^{-1}$ Mpc and $\alpha > 0.15$, which is a
region of parameter space that already does not appear to have
a strong correlation between $L_X$ and richness. Therefore, we
conclude that the aperture cutoff does not have a significant
effect on our results.

To determine the uncertainty in the recovered parameters $R_0$
and $\alpha$, we need to understand the errors in our measurement of
the $L_X$-richness scatter. We estimate these errors using bootstrap
resampling. We proceed as follows: let $\mu$ be an index
that runs over all grid points $(R_0, \alpha)$, and $\sigma_\mu$ be the scatter at
the $\mu$-th grid point. We resample (with replacement) the full
maxBCG catalog, and measure the scatter at every grid
point. The procedure is iterated 100 times, and the measurements
are used to estimate the mean and covariance matrix
of $\sigma_\mu$.* Assuming that the probability distribution $P(\sigma_\mu)$ is
a multi-variate Gaussian characterized by the observed mean
and covariance matrix, we generate 10^ Monte Carlo realizations
of the scatter, and estimate the fraction of times that each
grid point is observed to have the lowest scatter among all grid
points.
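
A minimal sketch of this two-stage error estimate is given below. The callable `measure_scatter_on_grid` is hypothetical (it would return the flattened grid of scatters for one resampled catalog), and the number of Monte Carlo draws `n_mc` is a placeholder value rather than the number used in the analysis.

```python
import numpy as np

def bootstrap_scatter(measure_scatter_on_grid, n_clusters, n_boot=100, seed=0):
    """Scatter at every grid point for n_boot resampled catalogs.

    measure_scatter_on_grid(idx) is a hypothetical callable taking resampled
    cluster indices and returning a 1D array with one scatter per grid point.
    """
    rng = np.random.default_rng(seed)
    rows = []
    for _ in range(n_boot):
        idx = rng.integers(0, n_clusters, size=n_clusters)  # resample with replacement
        rows.append(measure_scatter_on_grid(idx))
    return np.array(rows)

def min_scatter_probability(boot_scatter, n_mc=10000, seed=0):
    """Fraction of Gaussian realizations in which each grid point has the lowest scatter."""
    rng = np.random.default_rng(seed)
    mean = boot_scatter.mean(axis=0)
    cov = np.cov(boot_scatter, rowvar=False)   # covariance between grid points
    draws = rng.multivariate_normal(mean, cov, size=n_mc)
    winners = np.argmin(draws, axis=1)
    return np.bincount(winners, minlength=boot_scatter.shape[1]) / n_mc
```

Drawing from the fitted Gaussian, rather than running more bootstraps, keeps the number of expensive scatter measurements at 100 while still giving a smooth estimate of the probabilities.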

* The measurement of the scatter in $L_X$ at fixed richness is very time
consuming, and needs to be done independently for every point in the grid.
This explains why we restrict ourselves to only 100 bootstrap resamplings.

To use the grid to zero in on a particular value for $R_0$ and $\alpha$,
and to estimate errors in these values, we fit each of the 10^
realizations of the scatter $\sigma_{\ln L_X|\lambda}(R_0, \alpha)$ with a 2D parabola.
From the fits, we can read off the values of $R_0$ and $\alpha$ at which
the minimum occurs, giving us 10^ samplings of the probability
distribution of the location of the minimum in parameter
space. The probability distribution of the resulting 10^ minima
is exactly what we desired.
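
One way to carry out the parabolic fit for a single realization is sketched below, assuming a general six-term quadratic in $(R_0, \alpha)$ fitted by linear least squares. The function name and the exact functional form are assumptions made for illustration.

```python
import numpy as np

def parabola_minimum(r0, alpha, scatter):
    """Fit s = c0 + c1*x + c2*y + c3*x^2 + c4*x*y + c5*y^2 and return its stationary point.

    r0, alpha, scatter : arrays of equal shape, one entry per grid point.
    The stationary point is a minimum only if the fitted Hessian is positive definite.
    """
    x = np.asarray(r0, dtype=float).ravel()
    y = np.asarray(alpha, dtype=float).ravel()
    s = np.asarray(scatter, dtype=float).ravel()
    design = np.column_stack([np.ones_like(x), x, y, x ** 2, x * y, y ** 2])
    c, *_ = np.linalg.lstsq(design, s, rcond=None)
    hessian = np.array([[2.0 * c[3], c[4]],
                        [c[4], 2.0 * c[5]]])
    # Setting the gradient to zero gives hessian @ [x, y] = -[c1, c2].
    return np.linalg.solve(hessian, -c[1:3])
```

Applying the function to every Monte Carlo realization yields one $(R_0, \alpha)$ pair per realization. When the scatter surface is nearly flat along one direction, the fitted Hessian is close to singular and the recovered minima spread out along that direction.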

As it turns out, and as discussed in § 5, the coarse grid defined
above is too broad for a parabolic fit to adequately describe
the function $\sigma_{\ln L_X|\lambda}(R_0, \alpha)$. However, if we restrict ourselves
to a smaller region of parameter space near the minimum
determined from the coarse grid, a quadratic fit becomes
adequate. Therefore, we have defined a narrower fine grid,

$$R_0 = \{1.0, 1.1, 1.2, 1.3, 1.4\} \tag{14}$$
$$\alpha = \{0.22, 0.26, 0.30, 0.34, 0.38, 0.42\} \tag{15}$$

with $R_0$ measured in units of $h^{-1}$ Mpc. It is this grid that we
use to report our final results and to select the optimal parameters
$R_0$ and $\alpha$.

To summarize, we first do a rough exploration of the parameter
space $R_0$ and $\alpha$ using a coarse grid, and then use a
smaller but finer grid to statistically constrain the location of
the scatter minimum.

5. RESULTS 

5.1. The Full Sample 

Figure 1 illustrates the probability that each coarse grid
point is found to minimize the scatter of the 2000 richest
clusters when resampling our data as described in § 4. For
this plot, we have used the full cluster sample, though a similar
result holds when using the clean cluster sample. Each
square is shaded in gray on a log scale according to the fraction
of trials in which that point is found to have the minimum scatter.
The primary feature of this plot is a broad degeneracy
region from $(R_0, \alpha) \approx (0.8, 0.0)$ to $(R_0, \alpha) \approx (1.4, 0.5)$,
corresponding to a scatter $\sigma_{\ln L_X|\lambda} \approx 0.78$. Note this scatter is
a significant improvement relative to the $L_X$-richness scatter
measured for $N_{200}$, $\sigma_{\ln L_X|N_{200}} = 0.96$. The scatter in $L_X$
increases as we move away from the degeneracy region, ranging
from $\sigma_{\ln L_X|\lambda} \approx 0.86$ in the lower-right corner of Figure 1
to $\sigma_{\ln L_X|\lambda} > 1.0$ in the upper-left corner. Further discussion
of why our new richness estimator results in significantly reduced
scatter is presented in § 6.

Figure 2 shows the probability density of the points in
$R_0$-$\alpha$ space that minimize the scatter in $L_X$ at fixed richness
for the fine grid, as estimated through the parabolic fits to the
function $\sigma_{\ln L_X|\lambda}(R_0, \alpha)$ described in § 4. The solid
contours are for the full cluster sample and the dashed contours
are for the clean cluster sample. The diagonal degeneracy
suggested in the previous plot is now very obvious, especially
in the $2\sigma$ contour. Importantly, both the full and clean
samples produce very similar results, although the contours are
noticeably smoother for the clean sample. We note that the
closing of the $1\sigma$ contours in the upper-right and lower-left is
likely an artifact of the grid boundaries. As demonstrated in
the coarse grid in Figure 1, the degeneracy region extends at
least from $\alpha \approx 0$ to $\alpha \approx 0.5$.

The existence of the degeneracy region is relatively simple 
to explain. Consider the problem we are trying to address: 
what is the correct size of a cluster? Roughly speaking, this 
involves two parts: one, determining the correct cluster size 
of the average cluster, and two, determining how the cluster 
size scales with richness as one moves away from the average 
cluster. The former is much better determined than the latter,
so in the $(R_0, \alpha)$ plane, one typically expects a sharp constraint
on the mean cluster radius, and a considerably weaker
constraint on the orthogonal direction, corresponding to the
scaling of the radius with richness around the statistical pivot
point. Thus, we expect the observed degeneracy between $R_0$
and $\alpha$ to pick out parameter combinations that hold the median
cluster radius of the sample fixed.

Fig. 1. Probability that a given point in the grid minimizes the scatter in
$L_X$ at fixed richness in the coarse grid. The gray scale varies logarithmically
with the probability, which is explicitly quoted in the Figure. Note the broad
degeneracy region from $(R_0, \alpha) \approx (0.8, 0.0)$ to $(R_0, \alpha) \approx (1.4, 0.5)$, where the
scatter $\sigma_{\ln L_X|\lambda} \approx 0.78$.

Fig. 2. Contour plot of the probability density of the points in $R_0$-$\alpha$
space that minimize the scatter $\sigma_{\ln L_X|\lambda}(R_0, \alpha)$. The solid contours show the
$1\sigma$ and $2\sigma$ contours for the full sample, and the dashed lines show the same
contours for the "cleaned" sample (see § 2.4). The closing of the $1\sigma$ contours
in the upper-right and lower-left is likely an artifact of the grid. The dotted
line shows the contour of fixed mean cluster radius $R_c = 900\,h^{-1}$ kpc. All the
richness estimators along this line result in the same mean cluster radius, and
therefore have very similar richness values.

Figure 2 clearly illustrates that this is the case. In the figure,
the diagonal dotted line corresponds to a contour of fixed median
cluster radius $R(R_0, \alpha) = 900\,h^{-1}$ kpc, where the function
$R(R_0, \alpha)$ is defined as the median cluster radius of the 2000
richest clusters. The fact that this contour falls almost exactly
along the observed degeneracy between $R_0$ and $\alpha$ strongly
supports our interpretation.
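
The contour of fixed median radius can be traced with a short sketch. It assumes the aperture scaling $R_c = R_0 (\lambda/100)^{\alpha}$ and, as a simplification, treats the richness values as fixed while $R_0$ and $\alpha$ vary (in the actual analysis the richness itself depends on the aperture). The function name is hypothetical.

```python
import numpy as np

def r0_on_fixed_radius_contour(alpha, richness, r_median=0.9, pivot=100.0):
    """R_0 (h^-1 Mpc) for which the median of R_0 * (richness / pivot)**alpha equals r_median."""
    scale = np.median((np.asarray(richness, dtype=float) / pivot) ** alpha)
    return r_median / scale
```

For a sample whose typical richness lies below the pivot of 100, the required $R_0$ grows with $\alpha$, which is the sense of the diagonal degeneracy in the $(R_0, \alpha)$ plane.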

Our argument suggests a way to break the degeneracy between
$R_0$ and $\alpha$. If we can measure the scatter in $L_X$ at fixed
richness at two very different richness scales, then the mean
radius picked out by each of the samples will be substantially
different. This, in turn, rotates the degeneracy lines relative to
each other, so that the intersection defined by the two samples
would cleanly pick out a single value for $R_0$ and $\alpha$.

We have repeated our analysis on the top 500 and 1000 clus- 
ters, but these thresholds are much too close to our reported 
2000 clusters to be able to successfully break the observed de- 
generacy. Ideally, we would repeat our study using the 10000 
or 20000 richest clusters, thereby guaranteeing a degeneracy 
region that is significantly rotated relative to that of Figure 
2. Unfortunately, performing our scatter analysis on the top 
10000 clusters is not presently possible since the vast majority 
of this larger cluster sample does not emit sufficiently in X-rays
to allow for individual luminosity estimates of the clusters.
Furthermore, when choosing more than the top ~3000
clusters we begin to run into threshold effects due to the initial
selection of maxBCG clusters with $N_{200} > 10$. One might
hope instead to repeat our analysis using not the top 10000 
clusters, but rather the top 100 clusters, that is, by limiting 
ourselves to the very richest systems. Unfortunately, this suf- 
fers from a different problem: when looking at the top 100 
clusters only, the range of richnesses being sampled is much 
too narrow to allow a simultaneous estimate of the amplitude, 
slope, and scatter of the $L_X$-richness relation, so performing
our analysis using the top 100 clusters only is also not feasible.
Thus, for the time being, we must simply accept the
existence of a large degeneracy between $R_0$ and $\alpha$.

5.2. Selecting an Optimal $\alpha$

Due to the large degeneracy between $R_0$ and $\alpha$, it is difficult
to select any single point in $R_0$-$\alpha$ space as optimal. We note,
however, that the degeneracy region goes through $\alpha = 1/3$,
which is loosely theoretically motivated based on the naive
expectation $R^3 \propto M \propto \lambda$. Since our goal is to define a unique
richness measure, we have opted for setting $\alpha = 1/3$. Given
that the degeneracy region goes through $\alpha = 1/3$, our choice
does not adversely affect the properties of our richness estimator.
That is, the scatter for $\alpha = 1/3$ is indistinguishable from
that of the best possible value for $\alpha$ to within observational
uncertainties.

Using a principal component analysis on the best-fit minima
that describe the contours in Figure 2, we have calculated
the degeneracy axis for each of the full and clean cluster samples.
For the full cluster sample we obtain

$$\ln(R_0 / 1\,h^{-1}\,{\rm Mpc}) - 1.342\,(\alpha - 0.33) = 0.25 \pm 0.04, \tag{16}$$

while for the clean cluster sample we find

$$\ln(R_0 / 1\,h^{-1}\,{\rm Mpc}) - 1.277\,(\alpha - 0.33) = 0.24 \pm 0.03. \tag{17}$$

We have confirmed that the residuals are Gaussian along most
of the degeneracy axis. We quote the degeneracy line in terms
of $\ln R_0$ and $\alpha$ rather than $R_0$ and $\alpha$ themselves simply because
the former results in more accurate extrapolations for $\alpha$ values
that are very different from $\alpha = 1/3$.
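
As a quick numerical illustration, the two degeneracy lines can be solved for $R_0$ at any chosen $\alpha$. The helper below is a sketch with a hypothetical name; it uses only the central values of the slopes and intercepts quoted in Eqs. 16 and 17 and ignores their uncertainties.

```python
import math

def r0_on_degeneracy_line(alpha, slope, intercept, alpha_pivot=0.33):
    """Solve ln(R_0) - slope * (alpha - alpha_pivot) = intercept for R_0 in h^-1 Mpc."""
    return math.exp(intercept + slope * (alpha - alpha_pivot))

r0_full = r0_on_degeneracy_line(1.0 / 3.0, slope=1.342, intercept=0.25)   # full sample
r0_clean = r0_on_degeneracy_line(1.0 / 3.0, slope=1.277, intercept=0.24)  # clean sample
```

With the central values, $\alpha = 1/3$ gives $R_0$ of approximately 1.29 and 1.28 $h^{-1}$ Mpc for the full and clean samples, respectively.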

We are encouraged by the fact that the clean and full samples
give fully consistent results, thus showing that the known
cool core clusters and obvious foreground contamination are
not significantly biasing the best combination of $R_0$ and $\alpha$.
Our final choice for $R_0$ and $\alpha$ is therefore $R_0 = 1.27\,h^{-1}$ Mpc
and $\alpha = 1/3$.

5.3. Improvement in the Scatter 

Now that we have fully specified $R_0 = 1.27\,h^{-1}$ Mpc and
$\alpha = 1/3$, we have measured the matched filter richness of every
cluster in the Koester et al. (2007b) sample. Figure 3
Table 1. Scatter in $L_X$ at fixed richness, top 2000 clusters

Richness                                  Full Sample    Clean Sample
$N_{200}$                                 0.95 ± 0.03    0.86 ± 0.02
$N_{200}$-$L_{\rm BCG}$ combination       0.84 ± 0.02    0.78 ± 0.02
$\lambda$                                 0.79 ± 0.02    0.70 ± 0.02
$\lambda$                                 0.78 ± 0.02    0.69 ± 0.02

Except for the last row, $L_X$ was measured within a fixed $1\,h^{-1}$ Mpc aperture.
The scatter in $L_X$ quoted in the last row differs only in that $L_X$ was measured
within the assigned optical cluster radius $R_c(\lambda)$. The combination of $N_{200}$ and $L_{\rm BCG}$
was suggested by Reyes et al. (2008) as an improvement over $N_{200}$. The error
bars define 68% confidence intervals.

Fig. 3. Top panel: $L_X$ vs. $N_{200}$ for the 3000 richest clusters. Following
Rykoff et al. (2008b), the solid points represent $> 1\sigma$ detections, and the
empty circles represent $1\sigma$ upper limits. The vertical dotted line represents
the cutoff for the top 2000 clusters used in the analysis. The dashed lines
represent the scatter constraints. The fictitious data point in the
lower-right corner shows the typical $L_X$ error. The red diamonds represent
clusters that are excluded from the clean sample because they are obviously
contaminated by foreground X-ray emission. The blue squares represent clusters
that are excluded from the clean sample because they are known cool
core clusters. Bottom panel: $L_X$ vs. $\lambda$ for $R_0 = 1.27$, $\alpha = 1/3$ for the 3000
richest clusters; the symbols are the same as for the top panel. Our optimized
matched filter richness estimate $\lambda$ is significantly more tightly correlated with
$L_X$ than $N_{200}$.



shows $L_X$ vs. $N_{200}$ (top panel) and $L_X$ vs. $\lambda$ (bottom panel) for
the 3000 richest clusters. Following Rykoff et al. (2008b),
the solid points represent detections at the $> 1\sigma$ level, and
the empty points represent $1\sigma$ upper limits. The vertical dotted
line represents the cutoff for the 2000 richest clusters
used in this analysis. Though not obviously visible in this
plot, the scatter at fixed $\lambda$ is significantly decreased. We note that
there are still some significant outliers in the $L_X$-$\lambda$ relation,
especially at high $L_X$. The red diamonds and blue squares represent
clusters that are removed from the clean cluster sample.
The red diamonds are clusters whose measured X-ray flux is
known to be contaminated by foreground emission from stars,
nearby galaxy clusters, or AGN. The blue squares represent
the known cool core clusters. These are, for the most part,
significantly brighter than typical maxBCG clusters at similar
richness, which is consistent with the hypothesis that the X-ray
luminosity of these clusters is boosted by emission from
the core.

Table 1 summarizes how the scatter of the 2000 richest clusters
varies as we change our richness measure. Here, we consider
three richness measures only: $N_{200}$, which is the original
richness estimate for maxBCG clusters presented in Koester
et al. (2007a); the combination of $N_{200}$ and $L_{\rm BCG}$, which was suggested by Reyes et al.
(2008) as an improvement over $N_{200}$ by making use of $L_{\rm BCG}$,
the luminosity of the cluster BCG; and our optimized matched
filter richness estimator $\lambda$. We see that for both the full and
clean samples, our optimized matched filter estimator significantly
outperforms both $N_{200}$ and the $N_{200}$-$L_{\rm BCG}$ combination. To quantify the
significance of the improvement, we must take into account
the fact that the errors are correlated. Following § 4, we have
performed bootstrap resampling on the full catalog and clean
catalog, calculating the scatter in the top 2000 clusters for
both $\lambda$ and $N_{200}$. For each bootstrap resampling we calculate
$r = \sigma_{\ln L_X|\lambda} / \sigma_{\ln L_X|N_{200}}$. The deviation from $r = 1.0$ can be
used to quantify the significance of the improvement. The improvement
in the scatter relative to $N_{200}$ is significant at $9\sigma$ for
the full cluster sample, and at $11\sigma$ for the clean sample.
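
The ratio test can be illustrated with the sketch below. The specific statistic, the mean deviation of $r$ from unity in units of its bootstrap standard deviation, is an assumption about how the deviation is turned into a significance. The function name is hypothetical.

```python
import numpy as np

def improvement_significance(scatter_new, scatter_old):
    """Significance of r = scatter_new / scatter_old lying below 1.

    Both inputs hold one scatter per bootstrap resampling, measured on the
    SAME resampled catalogs so that correlated errors cancel in the ratio.
    """
    r = np.asarray(scatter_new, dtype=float) / np.asarray(scatter_old, dtype=float)
    return (1.0 - r.mean()) / r.std(ddof=1)
```

Working with the ratio on matched resamplings is what accounts for the correlation between the two scatter estimates: fluctuations common to both measurements largely drop out of $r$.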

6. REDSHIFT DEPENDENCE 

Rykoff et al. (2008b) showed that there is strong redshift
evolution in the $\langle L_X | N_{200} \rangle$ relation of maxBCG clusters. Similar
redshift dependence is observed in the velocity dispersion-optical
richness relation measured in Becker et al. (2007a).
This is best understood as a variation of $N_{200}$ at fixed mass,
with an observed fractional decrease in $N_{200}$ of 30% - 40%
over the redshift range of the maxBCG catalog. In our previous
work, the origin of this redshift dependence was unclear.
Here, we demonstrate how the matched-filter richness
removes this redshift dependence, and show the pitfalls of a
simple richness estimator such as $N_{200}$.

Figure 4 shows the $\langle L_X | N_{200} \rangle$ relation for maxBCG clusters
split into three different redshift bins (solid symbols). Also
shown is the mean relation $\langle L_X | \lambda \rangle$ for the same three redshift
bins (empty symbols). It is obvious from the figure that
the redshift evolution in the $L_X$-richness relation is significantly
weaker for $\lambda$ than it is for $N_{200}$. We have fit the data
with a power-law evolution in redshift, following Rykoff et al.
(2008b, §5.3):



$$\langle L_X | X \rangle \propto X^{\beta} \left( \frac{1+z}{1+\tilde{z}} \right)^{\gamma} \tag{18}$$



where $\tilde{z}$ is the median redshift of the cluster sample and $X$ is
the richness measure of interest. We find that $\gamma = 6.0 \pm 0.8$ for
$N_{200}$ while $\gamma = 0.7 \pm 0.8$ for $\lambda$, consistent with no evolution.

Note, however, that even if the relation between $\lambda$ and cluster
mass is redshift independent, we expect to see evolution
in the $L_X$-$\lambda$ relation due to evolution in the $L_X$-$M$ relation.
The expectation for self-similar evolution in $L_X$ at fixed mass
is that $L_X \propto \rho_c(z)^{7/6}$ for bolometric luminosities, but closer to

Table 2. Scatter ($\sigma_{\ln L_X}$) and redshift evolution ($\gamma$)



Fig. 4. $\langle L_X \rangle$ vs. richness in three different redshift bins. The empty
points denote the matched filter richness $\lambda$, and the solid points denote the
original maxBCG richness $N_{200}$. The three redshift bins are: $0.10 < z < 0.18$
(blue circles); $0.18 < z < 0.26$ (green squares); $0.26 < z < 0.30$ (red diamonds).
The normalization of $\langle L_X \rangle$-$N_{200}$ has been multiplied by 5 for clarity.
It is readily apparent that $N_{200}$ has a strong redshift dependence (Rykoff
et al. 2008b; Becker et al. 2007a), while $\lambda$ does not.



$\rho_c$ for soft-band X-ray luminosities (Kaiser 1986). Here, $\rho_c$
is the critical density of the universe at redshift $z$. In a ΛCDM
universe with $\Omega_m = 0.25$, the expected soft X-ray band evolution
is thus $\gamma \approx 1.05$, so our results are also consistent with
self-similar evolution.

The striking difference in the evolution in the $L_X$-richness
relation between $\lambda$ and $N_{200}$ is due to the differences in how
$N_{200}$ and $\lambda$ employ galaxy colors when estimating cluster richness.
For $N_{200}$, a galaxy contributes to the richness if and
only if its color differs from the BCG color by no more than
twice the sum in quadrature of the intrinsic width of the ridgeline
and the galaxy's photometric error. That is,
$N_{200}$ weighs galaxies according to the probability distribution
$p_{\rm top-hat}(c)$ given by:



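$$p_{\rm top\text{-}hat}(c) = \begin{cases} 1 & \text{if } |c - c_{\rm BCG}| < 2\sqrt{\sigma_{\rm int}^2 + \sigma_m^2} \\ 0 & \text{otherwise} \end{cases} \tag{19}$$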



where $\sigma_{\rm int} = 0.05$ is the intrinsic width of the ridgeline. This
is a top-hat distribution in observed color, but the width of the
top-hat depends on the photometric error of the galaxy under
consideration. Also, note that the center of the color box is not
the model $\langle c|z \rangle$ quoted earlier, but rather the color of the BCG,
which, as we show below, is a very significant difference.
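
The contrast between the two color selections can be made concrete with a small sketch. The Gaussian weight below is only the peak-normalized color profile, not the full matched-filter membership probability (which also involves the radial and magnitude filters and the background). The function names are hypothetical.

```python
import numpy as np

SIGMA_INT = 0.05  # intrinsic ridgeline width (value from the text)

def top_hat_weight(c, c_center, sigma_phot):
    """1 inside twice the quadrature sum of intrinsic and photometric widths, else 0."""
    half_width = 2.0 * np.hypot(SIGMA_INT, sigma_phot)
    return (np.abs(np.asarray(c) - c_center) < half_width).astype(float)

def gaussian_weight(c, c_center, sigma_phot):
    """Peak-normalized Gaussian color profile with the same net dispersion."""
    sigma = np.hypot(SIGMA_INT, sigma_phot)
    return np.exp(-0.5 * ((np.asarray(c) - c_center) / sigma) ** 2)
```

With the top-hat, a galaxy just outside the box contributes nothing, so a small shift in `c_center` can remove many galaxies at once; the Gaussian weight changes smoothly with the same shift.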

To illustrate how these differences in the color filter result
in differences in the evolution and scatter of $\lambda$ and $N_{200}$,
we have defined three additional richness measures with key
properties bridging those of $\lambda$ and $N_{200}$. Including $\lambda$ and $N_{200}$,
the five richness measures considered here are

1. $\lambda$: the matched filter richness with a variable aperture,
as described above, with a Gaussian color filter centered
on $\langle c|z \rangle$.

2. $\lambda_{\rm BCG}$: the matched filter richness using the same aperture
as with $\lambda$, but with the Gaussian model centered on
$c_{\rm BCG}$.

3. $N_{\rm top-hat,model}$: a top-hat richness using the $p_{\rm top-hat}$ formulation
above, centered around $\langle c|z \rangle$ as in Eqn. 11,
measured on a fixed $1\,h^{-1}$ Mpc scale.
Richness                    $\sigma_{\ln L_X}$ (a)    $\gamma$
$\lambda$                   0.78 ± 0.02               0.7 ± 0.8
$\lambda_{\rm BCG}$         0.82 ± 0.02               1.1 ± 0.8
$N_{\rm top-hat,model}$     0.80 ± 0.02               0.5 ± 0.8
$N_{\rm top-hat,BCG}$       0.99 ± 0.02               4.2 ± 0.7
$N_{200}$                   0.95 ± 0.02               6.0 ± 0.8

(a) For the top 2000 clusters




4. $N_{\rm top-hat,BCG}$: a top-hat richness using the $p_{\rm top-hat}$ formulation
above, centered around $c_{\rm BCG}$, measured on a fixed
$1\,h^{-1}$ Mpc scale. This is similar to the maxBCG $N_{\rm gals}$
richness, without the additional cut on the r-i color of
the member galaxies.

5. $N_{200}$: the original maxBCG richness estimator, measured
in a scaled radius $r_{200}^{\rm gals}$ with the color filter centered
on $c_{\rm BCG}$.

Table 2 shows the scatter (in the top 2000 clusters) and evo- 
lution parameters for these various richness estimators. There 
are two key observations that we can make here. First, when 
using the top-hat richness, centering around the model color is 
significantly better than centering on the BCG color, in terms 
of decreasing both the scatter and evolution of the richness 
measure. Second, the smooth Gaussian filter centered on the 
BCG color works almost as well as the Gaussian filter centered
on the model color. This is a significant result, because
it implies not only that the resulting richnesses are more robust
to moderate changes in the color filter parameters, but
also that the richness measure itself is robust to photometric
redshift errors. The reason for this robustness is simple: when
using a color top-hat selection, using the correct color model
is of paramount importance since miscentering of the top-hat
will lead to underestimates of the richness. In the matched
filter framework, what is important is the relative galaxy den- 
sity of the cluster and field components, which can remain 
high even if the centering of the ridgeline color is slightly 
displaced. Thus, matched-filter richness estimates are much 
more robust to small changes in the parameters of the color 
filter than estimates based on simple color cuts. 

As an illustration of this effect, Figure 5 shows the
color distribution of all galaxies brighter than $0.4L_*$
within $1\,h^{-1}$ Mpc (solid black line) of the galaxy cluster
SDSS J082026.8+073650.1 at a redshift $z_{\rm spec} = 0.22$. This
cluster was selected because of the large discrepancy between
$N_{200}$ and $\lambda$. The color of the cluster BCG (solid red line) is
significantly redder than the red sequence. The dotted vertical
lines show the $\pm 2\sigma$ color cut, which does not include
the peak of the red sequence. As a result, $N_{200}$ is significantly
underestimated in this system. The blue curve shows
the same galaxy distribution, but weighing each galaxy by its
membership probability as estimated using the matched filter
approach. As we can see, the matched filter effectively selects
galaxies in the red sequence.

We have demonstrated that the redshift evolution observed
in the $L_X$-$N_{200}$ relation is primarily caused by using a top-hat
filter centered on the color of the BCG. Why such a choice of
color filter results in the strong evolution we observe for $N_{200}$
is a complicated question, with at least two physical mechanisms
contributing to the problem at comparable levels. First,
Fig. 5. Color distribution of all the galaxies brighter than $0.4L_*$ within
$1\,h^{-1}$ Mpc (solid black line) of the galaxy cluster SDSS J082026.8+073650.1
at a redshift of $z_{\rm spec} = 0.22$. The distribution is estimated using a Gaussian
Kernel Density Estimator (KDE), with the size of the kernel selected to adequately
sample the peak due to ridgeline galaxies. The cluster BCG color
(solid red line) is significantly redder than the red sequence (peak of the black
distribution). The dotted vertical lines show the $\pm 2\sigma$ color cut, which does
not include the bulk of the red sequence, and therefore $N_{200}$ is significantly
underestimated. The blue curve is the KDE estimate of the galaxy distribution,
except every galaxy has been weighted by its membership probability
as estimated using the matched filter approach. We can see the matched filter
richness estimate selects principally ridgeline galaxies.



there is the fact that even for a correctly centered top-hat filter,
a ridgeline galaxy can fall outside the color cuts due to photometric
errors. Since photometric errors increase with increasing
redshift, a color cut such as that of $N_{200}$ will progressively
lose more galaxies as one moves the cluster to higher redshift.
Second, the E/S0 ridgeline is not flat, but has a slight tilt
($\sim -0.04$ mags/mag in g-r vs. i), such that brighter galaxies
tend to be redder (e.g. Visvanathan & Sandage 1977; Renzini
2006). By centering the color filter on the BCG (by definition
the brightest and usually reddest cluster member), a
small richness bias is introduced: clusters with brighter BCGs
have a color filter centered redward of the average BCG color.
Moreover, recent work by Hao et al. (in preparation) shows
that with a proper account for photometric errors, the ridgeline
tilt evolves with redshift, such that the ridgeline is steeper
at $z = 0.3$ than at $z = 0.1$. Consequently, a BCG centered color
cut becomes increasingly offset from the true mean ridgeline
color as we increase redshift. Both of these systematic effects
occur with similar magnitude, and act in concert to produce
the observed evolution in $N_{200}$. We emphasize, however,
that our matched filter richness estimator does not suffer from
these systematic effects.

Finally, we can now also explain why $N_{200}$ exhibits stronger
evolution than $N_{\rm top-hat,BCG}$. Recall that the aperture used to estimate
$N_{200}$ is itself based on the richness measure $N_{\rm gals}$, which
is very similar to $N_{\rm top-hat,BCG}$. Since $N_{\rm top-hat,BCG}$ systematically
underestimates the richness for high redshift clusters
due to the increasing tilt of the ridgeline, the aperture $r_{200}^{\rm gals}$,
which scales with $N_{\rm top-hat,BCG}$, is also underestimated. This
compounds the effect of incorrect centering of the color box
and results in stronger redshift evolution.

7. SUMMARY AND CONCLUSIONS 

We have introduced a new matched filter richness estimator
$\lambda$ whose correlation with mass is significantly tighter than
that of $N_{200}$, the original maxBCG richness estimator. Relative
to other matched filter estimates, our estimator has two
significant differences:

1. The richness is measured on a scale that is optimized
in the sense that it minimizes the scatter in $L_X$ at fixed
richness.

2. In addition to radial and magnitude filters, we include
a color filter. This is of crucial importance for differentiating
between member and non-member (projected)
galaxies.

The first of these points is important since we have demonstrated
that a poor choice of aperture increases the scatter in mass at
fixed richness, while the latter minimizes the impact of projection
effects in richness estimates. Of the two, however, the
improved treatment of galaxy color is the principal reason for
the marked reduction of the scatter in the $L_X$-richness relation.

Our procedure for aperture optimization can be easily generalized
to any mass tracer for which one can construct a calibrating
data set. In our particular case, we minimize the scatter
in the $L_X$-$\lambda$ relation by measuring both $L_X$ and $\lambda$ within
an aperture $R_c(\lambda) = R_0 (\lambda/100)^{\alpha}$, and varying the model
parameters $R_0$ and $\alpha$. Given the small richness range probed
by our sample, we have not been able to isolate unique values
for $R_0$ and $\alpha$, finding instead a degeneracy region corresponding
to a fixed mean cluster radius for the clusters in
the sample. Based on a priori assumptions about the radius-richness
scaling, we have fixed $\alpha = 1/3$, which yields a normalization
of $R_0 = 1.27 \pm 0.03\,h^{-1}$ Mpc. We note, however, that the
degeneracy region intersects $\alpha = 0$ at $R_0 \approx 850\,h^{-1}$ kpc. Although
we expect that this fixed scaling will not be ideal at the
rich group/poor cluster scale, it does work as a "first guess"
richness and may be applicable to future cluster finding techniques.
At this point, it is unclear whether the cluster radii
selected by our technique reflect a true physical property of
the maxBCG clusters, or whether they are driven primarily by a
compromise between the increased signal one expects at
larger apertures, and the smaller noise one expects for smaller
apertures. Regardless of the source, it is likely that similar
aperture dependences exist for other mass tracers.
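
For reference, the adopted radius-richness scaling is simple to evaluate. The helper below is a sketch with a hypothetical name; it uses the final parameter values quoted above as defaults and does not reproduce the self-consistent measurement of $\lambda$ inside its own aperture.

```python
def cluster_radius(richness, r0=1.27, alpha=1.0 / 3.0, pivot=100.0):
    """Aperture R_c = r0 * (richness / pivot)**alpha, in h^-1 Mpc."""
    return r0 * (richness / pivot) ** alpha

radius_rich = cluster_radius(100.0)   # cluster at the pivot richness
radius_poor = cluster_radius(12.5)    # richness eight times lower
```

With $\alpha = 1/3$, lowering the richness by a factor of eight halves the aperture, so the two example clusters have radii of 1.27 and about 0.64 $h^{-1}$ Mpc.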

We have also found that our new richness estimator has scaling
relations whose redshift evolution is much milder than
those exhibited by $N_{200}$. This difference arises due to two effects:
first, $N_{200}$ uses a top-hat filter to select cluster galaxies,
whereas our matched filter estimator $\lambda$ uses Gaussian color
filters. Second, $N_{200}$ centers its color filter at the color of the
BCG, whereas $\lambda$ centers its color filter using an observationally
calibrated color-redshift relation. The fact that the color of
the BCG does not always agree with the observationally calibrated
color-redshift relation leads to a systematic difference
between the two richness measures. Moreover, we also found
that the sharp edges of the top-hat filter result in a richness estimator
that is very sensitive to the details of the color model,
whereas our Gaussian filter is much more robust to moderate
changes in the model parameters.

Restricting ourselves to the clean cluster sample, which excludes
cooling flow clusters and clusters with obvious foreground
contamination in their X-ray luminosities, we have
found that the scatter in the $L_X$-richness relation of the
2000 richest clusters is $\sigma_{\ln L_X|\lambda} = 0.69$ for $\lambda$, compared to
$\sigma_{\ln L_X|N_{200}} = 0.86$ for $N_{200}$. Assuming a slope of $\approx 1.6$ for
the $L_X$-$M$ relation (Stanek et al. 2006; Rykoff et al. 2008a;
Vikhlinin et al. 2008a), these amount to a logarithmic scatter
in the mass-richness relation of $\approx 0.43$ and 0.54 respectively.
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While this is a very significant improvement, we expect that 
further tightening of the scatter in mass at fixed richness must 
be possible. For instance, assuming the intrinsic scatter in 
the richness-mass relation is Poisson, the logarithmic scatter 
possible for clusters with 20 galaxies or so should be roughly 
wO.2. 
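
The numbers quoted above can be reproduced with a short calculation.
The sketch below assumes that all of the scatter in $L_X$ at fixed
richness is attributed to scatter in mass, so that dividing by the slope
of the $L_X$-$M$ relation gives the scatter in mass, and that the Poisson
scatter for $N$ galaxies is $1/\sqrt{N}$. The function names are
illustrative only.

```python
import math


def mass_scatter(sigma_ln_lx, slope):
    """Scatter in ln M implied by scatter in ln L_X, assuming L_X ~ M^slope
    and that all of the L_X scatter traces scatter in mass."""
    return sigma_ln_lx / slope


def poisson_log_scatter(n_gal):
    """Fractional (approximately logarithmic) Poisson scatter for n_gal galaxies."""
    return 1.0 / math.sqrt(n_gal)


SLOPE = 1.6  # assumed slope of the L_X-M relation, as quoted in the text

print(round(mass_scatter(0.69, SLOPE), 2))   # 0.43 (matched filter richness)
print(round(mass_scatter(0.86, SLOPE), 2))   # 0.54 (N_200)
print(round(poisson_log_scatter(20), 2))     # 0.22 (Poisson, 20 galaxies)
```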

Fortunately, there are still many options left for us to
explore in our quest to define optical mass proxies that can be
competitive with other mass tracers in terms of the tightness
of the correlation with mass. As we have defined it here, our
richness estimate only makes use of the number of galaxies
in the cluster. One could, for instance, weight our cluster
richness by other optical mass tracers such as the luminosity of the
brightest cluster galaxy (Reyes et al. 2008), the abundance of
baryons contained in the intracluster light (e.g. Gonzalez et al.
2007), or other aspects of the cluster galaxy morphology (e.g.
Bautz-Morgan Type, Bautz & Morgan 1970). In addition,
one could weight each galaxy's contribution to the richness by
physical observables such as galaxy luminosity. Such a
luminosity weighted richness estimate would be a measure of the
optical luminosity of the cluster as a whole, and might be
better correlated with mass than richness itself (see also Lin et al.
2003; Miller et al. 2005; Popesso et al. 2005). It is also likely
that further improvements in richness estimates can arise with
more accurate filters, a possibility we intend to explore in
future work. Finally, we know that even with today's filters, part
of the scatter we observe must be due to systematic effects
such as failures of the cluster finding algorithm in identifying
the correct center of a cluster, a problem which we have not
addressed in this work. For the time being, the fact that naive
theoretical expectations result in a scatter much lower than
previously observed, and the fact that our first attempt at
defining a better richness estimator resulted in a highly signif-

APPENDIX

A MAXIMUM LIKELIHOOD DERIVATION OF MATCHED FILTER RICHNESS ESTIMATORS 

Here, we derive equation 2 using a maximum likelihood approach, focusing first on the case where the filters $u(x|\lambda)$ are richness
independent. The derivation is as follows: we pixelize the observable space $x$ into infinitesimal pixels of "volume" $\Delta x$ such that
every pixel contains at most one galaxy. The likelihood that a given galaxy realization occurs is simply

$$\mathcal{L} \propto \prod_{\rm occupied} (\lambda u + b)\,\Delta x \prod_{\rm empty} \left[1 - (\lambda u + b)\,\Delta x\right] \qquad {\rm (A1)}$$

where the first product is over all occupied pixels, while the second product is over all empty pixels. We have neglected terms 
that do not depend on $\lambda$ as they will not contribute to the maximum likelihood richness estimator. Setting $\partial \ln \mathcal{L}/\partial \lambda = 0$, and
taking the limit $\Delta x \to 0$, we find that the maximum likelihood richness estimator $\lambda_{\rm ML}$ is given by the solution to

$$1 = \sum \frac{u}{\lambda u + b} \qquad {\rm (A2)}$$

where the sum is over all galaxies in the cluster field. This expression is identical to our naive richness estimator from equation 
2. 
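
The maximum likelihood condition $1 = \sum u/(\lambda u + b)$ has no closed
form solution in general, but it is straightforward to solve numerically.
The following is a minimal sketch, not the implementation used in this
work: it assumes the filter values $u(x_i)$ and background densities
$b(x_i) > 0$ have already been evaluated at each galaxy position, and it
uses plain bisection. The function name and its arguments are hypothetical.

```python
def ml_richness(u, b, tol=1e-10, max_iter=200):
    """Solve 1 = sum_i u_i / (lam * u_i + b_i) for lam >= 0 by bisection.

    u : filter values u(x_i) at each galaxy position (assumed inputs, >= 0)
    b : background densities b(x_i) at the same positions (assumed > 0)
    """
    u = [float(ui) for ui in u]
    b = [float(bi) for bi in b]

    def f(lam):
        return sum(ui / (lam * ui + bi) for ui, bi in zip(u, b)) - 1.0

    # f decreases monotonically with lam, so if f(0) <= 0 there is no
    # positive root and the estimate is taken to be zero.
    if f(0.0) <= 0.0:
        return 0.0

    # Each term is smaller than 1/lam, so f(N) < 0 for N galaxies:
    # the root is bracketed by (0, N).
    lo, hi = 0.0, float(len(u))
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Four galaxies with u = b = 1: the condition is 4 / (lam + 1) = 1.
print(ml_richness([1, 1, 1, 1], [1, 1, 1, 1]))
```

For the toy input the condition reduces to $4/(\lambda + 1) = 1$, whose
solution is $\lambda = 3$. Multiplying the condition by $\lambda$ shows that
the solution also equals the sum of the membership probabilities
$\lambda u/(\lambda u + b)$.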

We wish to briefly consider how richness dependent filters $u(x|\lambda)$ affect the maximum likelihood richness estimator. To do
this, we go back to equation A1. Taking the derivative of the log-likelihood with respect to $\lambda$ and setting it to zero we find that
the generalization of equation 2 is given by

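$$1 + \int dx\, \lambda \frac{\partial u}{\partial \lambda} = \sum \frac{u + \lambda\, \partial u/\partial \lambda}{\lambda u + b}. \qquad {\rm (A3)}$$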

We emphasize that the integral over $x$ and the derivative $\partial/\partial\lambda$ do not always commute. Indeed, consider the approach taken in
this paper, in which $u$ is taken to have a finite spatial extent of radius $R_c$, which is itself linked to richness via equation 3. The fact
that $u$ is zero for $R > R_c(\lambda)$ implies that the integration region for $u$ is $\lambda$ dependent, and thus the integral and derivative signs do
not commute.

To assess the impact of a richness dependent profile, we consider here a simple isothermal filter $u(R|\lambda) = 1/R_c$, where $R_c(\lambda)$ is
given by equation 3. For this filter, we then have

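$$\frac{\partial u}{\partial \lambda} = \frac{\partial u}{\partial R_c} \frac{\partial R_c}{\partial \lambda} = -\alpha \frac{u}{\lambda} \qquad {\rm (A4)}$$
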
where $\alpha$ is the slope of the radius-richness relation in equation 3. Our expression for the maximum likelihood richness estimator
becomes 

$$1 - \alpha = (1 - \alpha) \sum_{R < R_c(\lambda)} \frac{u}{\lambda u + b}.$$

We see that the $1-\alpha$ prefactors cancel on both sides of the equation, and thus our final expression for the maximum likelihood
richness estimator for $\lambda$ is still given by equation 4, even though $u$ is explicitly richness dependent. This suggests that our
naive estimator is in general very close to the true maximum likelihood estimator. We defer a detailed study of whether the
more complicated structure of the true maximum likelihood richness estimator for more elaborate cluster profiles can lead to a
significant improvement over the naive richness estimator from equation 4 to future work.
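
As a check on the cancellation, both sides of equation A3 can be evaluated explicitly for the isothermal filter. This sketch assumes
that $u = 1/R_c$ for $R < R_c(\lambda)$ and zero outside, that $R$ is the only coordinate being integrated over, and that the derivative
$\partial u/\partial \lambda = -\alpha u/\lambda$ is applied only inside the aperture.

```latex
\begin{align}
\int_0^{R_c} dR\, \lambda \frac{\partial u}{\partial \lambda}
  &= -\alpha \int_0^{R_c} \frac{dR}{R_c} = -\alpha, \\
\sum_{R < R_c(\lambda)} \frac{u + \lambda\, \partial u/\partial \lambda}{\lambda u + b}
  &= (1 - \alpha) \sum_{R < R_c(\lambda)} \frac{u}{\lambda u + b}.
\end{align}
```

The left hand side of equation A3 is therefore $1 - \alpha$, and for $\alpha \neq 1$ the condition reduces to $1 = \sum u/(\lambda u + b)$,
with the sum restricted to galaxies within $R_c(\lambda)$.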



The two dimensional density profile is, of course, $\Sigma(R) \propto 1/R$, but the radial probability density is $u(R) = 2\pi R\,\Sigma(R) = 1/R_c$.



